>> I'm Daan and I'm honored to host today Conal Elliott,
a former MSR colleague and a good friend of mine.
Conal used to work here at Microsoft Research until 2002,
and now he's a distinguished scientist at Target.
Back when he was here at Microsoft Research,
I was still a student, and I remember being so inspired
by his beautiful work on functional reactive animations;
at that time it was really inspirational to me.
One thing that Conal is great at
is taking something really complicated.
At that time, with animations, you would always have
matrices and frames and timers,
and he rephrased it in a beautiful way,
capturing the essence of animations as functions of time.
I think with this work he applies the same trick again,
but now to differentiation and machine learning,
which is always about all these complicated things
where you almost don't see the essence,
and this talk will hopefully shed some light on it.
So, thanks a lot, Conal.
>> Thanks, Daan. It's great to be here.
Yes, I was at Microsoft Research
in the graphics group from
'94 to 2002, I had a good time.
So, this is my first time back giving a talk.
So, anyway, we're going to talk about
automatic differentiation which has
to do roughly with computing derivatives automatically.
So, a good place to start
would be a question, what's a derivative?
All right, this is like the heart of the matter.
If we don't know what this means,
it's kind of hopeless to have
any clean, rigorous, illuminating story
about automatic differentiation.
So, if you had an education like mine,
you probably learned something like this:
I took calculus one in high school, and I learned that
the derivative of a function at a point is a number,
and that number has to do with the ratio of how fast
the output of the function is growing
relative to the input at that point.
That's the calculus one perspective.
But a little later, turns out well
it might not be a number, it might be a vector.
So, if my function yields a higher dimensional result,
not a scalar, not just reals,
we can get a vector out.
But the function could instead consume
a higher-dimensional input, in which case
we get a different kind of vector.
These two are often not
distinguished, which leads to all kinds of
confusions and bugs in
software, but they're importantly different.
They look alike but they mean
radically different things: vector versus co-vector.
Then, a little later,
if we have a higher-dimensional domain and co-domain,
input and output of the function,
then we're going to have to have
a matrix of partial derivatives
which is sometimes called the Jacobian matrix.
So, maybe we've arrived at the fully general story here,
a higher-dimensional domain and co-domain, but,
well, there are higher derivatives.
A higher derivative is
a derivative of a derivative, and so on.
So, a higher derivative is going to be even
higher dimensional still.
For instance, you get these kinds of arbitrarily
high-dimensional tensors.
So, anyway, and then,
we have all these different notions of
derivatives and every
one of those notions has its own chain rule,
they're all different from each other.
Of course, when you see this many variations,
there are probably going to be other variations as well.
So, we get a complicated incomplete story,
and this isn't going to go well for a calculus student,
it's not going to go well for
computer scientists or software engineers.
So, we need a better story. So, let's start over.
Instead of saying that the derivative of a function is
a number, a vector, a co-vector,
a matrix, or some kind of higher-dimensional beast,
there's one general, beautiful, unifying story,
which is that the derivative is a linear transformation.
That's the general story.
So, I'm going to talk about D. D is the differentiator.
So, differentiation is a function,
and I'm going to show you some notation.
If it's unfamiliar or you're confused,
please interrupt me and ask
questions, about this or anything else.
I enjoy interaction during the talk,
and there'll be time for questions at the end as well.
So, this is a bit of Haskell notation;
it means that D has the following type.
So, differentiation consumes a function from a to b,
and it yields a function from a to
linear transformations from a to b.
That's the general notion of a derivative.
All of these other notions are
representations of linear maps.
All of the chain rules are
implementations of the same thing,
which I'll show on the next slide.
So, here's the general definition of a derivative.
We're going to take a function f at a,
and we're going to perturb the input a little bit
and sample f, and we're also going to take
the value at a and perturb it a little bit linearly.
That perturbation is D f a:
it's going to be a linear transformation,
and we feed epsilon into that linear transformation.
So, this is the general definition of the derivative.
It works in all dimensions,
and it works for things other than
the usual vector kinds of things.
They just need to be vector spaces,
and a little bit more: limits have to exist.
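As a formula, this is essentially the Fréchet derivative; a reconstruction of the definition being described, assuming a norm on the spaces involved: D f a is the linear map satisfying

    \lim_{\varepsilon \to 0}
      \frac{\lVert f\,(a + \varepsilon) - (f\,a + \mathcal{D}\,f\,a\ \varepsilon) \rVert}
           {\lVert \varepsilon \rVert} = 0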
So, I learned this perspective as an undergrad
from a book called Calculus on Manifolds,
Michael Spivak, 1965.
It's a beautiful little gem of a book.
It gets right to the heart
of what differentiation is about.
Now, as I mentioned there were
all these different representations of derivatives,
each of them has its own chain rule.
But really those chain rules all mean the same thing.
They all have some form of multiplication:
you can multiply two scalars together;
you can multiply two vectors
in one sense and you get a
dot product, or in a different sense
and you get what's called the outer product,
so inner and outer products;
or if they're matrices, you get matrix products.
But all of those notions of products
represent the same thing which is
composition of linear maps.
So, the one unifying
general chain rule is the following: it says,
the derivative of a composition of two functions,
g compose f, at a is, well,
we're going to take the derivative of f at a,
and the derivative of g at f of a,
and we're going to compose them.
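Written out as an equation, a transcription of the rule being described:

    \mathcal{D}\,(g \circ f)\,a \;=\; \mathcal{D}\,g\,(f\,a) \circ \mathcal{D}\,f\,a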
So, normally you would see instead of composition,
you'd see some kind of multiplication here.
But the multiplication is there because
it correctly implements the composition.
But the real one single story is composition. Yes.
>> Could you go back one bit, to when
you showed the linear function?
>> Yes.
>> You're saying that,
normally, what you'd get is, or how we'd implement
a linear map, is a matrix
or some other representation of that function.
>> Yes, exactly. So, abstractly
we have a linear transformation from a to b;
concretely, there might be
a matrix that represents the linear transformation.
But the important thing is, if you think in terms of
representations: is that a matrix?
Is that a vector? Is it this kind of vector
or the other kind of vector, and so on?
Normally, linear algebra packages,
software libraries, will have
some matrix package, and it has
all these unsafe operations:
you multiply two matrices
and, well, they have to conform,
the width of one has to be the
same as the height of the other and so on,
and the type systems usually don't give that to you.
But if you say linear transformations,
the type system tracks the domain and co-domain.
You cannot have a type correct program
that tries to multiply matrices that don't fit.
Moreover, by talking about
it abstractly, and this is really key to this talk,
we can innovate and use different representations,
not a matrix representation, because
the matrix representation actually turns
out to be a bad idea, even
though it's just about universally adopted.
So, this form
of composition is sequential composition.
But there's another kind which is parallel composition.
So, if you have two functions and
the output of one
matches the input of the other you
can sequentially compose them.
But there's another form, which is: you
have two functions and they have
the same domain, the same input type.
So, one takes you from a to c,
the other takes you from a to d. You can fuse
those functions together into one that
gives you the pair of results.
It's a very simple definition; we call this fork.
So, the fork of f and g, when you apply it to a,
you give a to each of the functions
and you pair up the results.
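As a sketch in Haskell, this is the (&&&) combinator from Control.Arrow, specialized to ordinary functions (the paper writes it as a triangle); I'll call it fork here:

    fork :: (a -> c) -> (a -> d) -> (a -> (c, d))
    fork f g a = (f a, g a)   -- give the argument to both and pair the results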
So, logically, this is parallel.
When I say parallel, I don't mean operationally,
I don't mean that runs in parallel.
It's a good idea for it to run in
parallel but that's a separate issue.
I mean semantically, there's no data dependencies.
Whereas sequential composition is
all about data dependencies,
parallel composition is free of them.
So, these two forms of composition are both important.
It's often said that differentiation, especially
automatic or symbolic differentiation,
that these
methods are all about the chain rule.
They're not all about the chain rule;
the chain rule is only about sequential composition.
There are other, clumsier ways in which they
typically talk about parallel composition.
But we can talk about both really elegantly
in terms of this one notion,
and we get a lot of mileage out of
making this kind of single general definition.
Going back to sequential composition, the chain rule,
which you probably learned in
high school, except here in this generalized form.
So, there's a problem with the chain rule
if we're thinking about how to implement something.
We want to do an implementation, we want to
be able to reason about it,
we want it to be correct
and to optimize it, all that kind of thing.
Well, it's a huge help
for the algorithm to be compositional.
What I mean by compositional
is: whatever operation I'm performing,
like differentiation of a composition,
I form it by taking the derivative of each of
the two functions and then somehow combining them.
All right? Well, the chain rule doesn't work that way.
The chain rule isn't compositional:
if you look at the definition here,
the derivative of a composition,
well, it involves the derivatives of f and g,
but it also involves f itself.
So, this is going to be kind of awkward to
work with in terms of
reasoning and implementing compositionally.
But there's a simple fix.
The fix is: instead of computing just the derivative,
so starting with this function from a to b,
we give back a function from a
that not only computes the derivative
but also the original value, and that's it.
This enhanced specification is compositional,
because when you write out the implications of it,
computing the function and its derivative for g compose f
involves just f and g and their derivatives.
>> So you mean the b on the right side is
the same as the b computed above the function?
>> Exactly.
>> The left-hand side?
>> Exactly. The left-hand side?
>> Both that function.
So, taking g and b,
is that b the same b?
>> Yes.
>> Okay.
>> Exactly. So all we're doing,
and here's the definition, the specification,
is I'm using this fork operator,
though I didn't need to.
It says that this enhanced derivative of
f is f itself forked with
the derivative of f. That's all;
just don't forget f, that's all this is saying.
This one fix makes differentiation compositional,
and automatic differentiation does this.
You'll find this in any automatic differentiation,
because you have to, to make it compositional. All right.
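As a sketch in Haskell, with names I'm choosing to match the talk, 'der' for the mathematical derivative and plain functions standing in for linear maps, the enhanced specification is roughly:

    type LinMap a b = a -> b        -- placeholder: a linear map viewed as a function

    der :: (a -> b) -> (a -> LinMap a b)
    der = undefined                 -- the mathematical derivative; no general algorithm

    dHat :: (a -> b) -> (a -> (b, LinMap a b))
    dHat f a = (f a, der f a)       -- "don't forget f": value paired with derivative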
So, there's one other really
lovely result from this perspective,
this calculus-on-manifolds view,
of derivatives as linear maps,
and that's the following: the derivative of
a linear function is itself, everywhere.
That may sound weird.
But if you think about what derivatives really mean,
derivatives are all about linear approximations.
So, when you have the derivative of a function
at a point, what you're saying is,
well, we know what the value of the function is,
but we also know the first
derivative, which is like the tangent at that point;
that tangent is linear
and it's a local linear approximation.
At that point, it's exactly correct.
As it moves away from the point,
it's slightly incorrect and further you get
away it's more and more incorrect.
If you want to get a closer approximation
you could use a first and second derivative,
then you get some kind of local quadratic approximation.
It's going to be a much better approximation and so on.
But for differentiation, first derivatives,
it's all about
making linear approximations.
So, when I say the derivative of
a linear function is itself everywhere,
all I'm saying is that linear functions are
their own perfect linear approximations.
It's really saying almost nothing, and yet it's not completely obvious.
Now, this is something you know without knowing it.
If you say like what's the derivative
of x with respect to x,
you'd say it's one,
or 2x with respect to x, you say it's two.
But I just told you
the derivative is the function itself.
That's because when we say the derivative of
the function x goes to x is one,
it's not really one; it's the linear transformation
that multiplies by one,
which is the same as the identity function.
So, the derivative of the identity is the identity,
that's the general way of saying it.
Sometimes it looks different: if
you have scalar functions you might say it's one,
and if you have non-scalar functions
you might say it's the identity matrix.
The identity matrix, the
matrix with ones on the diagonal and
zeros elsewhere, is just a
representation of the identity function.
So, similarly, first and second,
these are functions that take a pair
and give you the first or second element.
Those are linear also, so those functions are
their own derivatives, also everywhere.
In general, if you take the enhanced derivative
of a linear function f at a,
you get f of a and f itself;
that's the derivative. All right.
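In symbols, the claim is just:

    f \text{ linear} \;\Rightarrow\; \mathcal{D}\,f\,a = f \ \text{ for all } a,
    \quad\text{e.g.}\quad \mathcal{D}\,\mathrm{id}\,a = \mathrm{id},\;\;
    \mathcal{D}\,\mathrm{fst}\,a = \mathrm{fst},\;\;
    \mathcal{D}\,\mathrm{snd}\,a = \mathrm{snd}.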
So, now I've really shown you
automatic differentiation, it's all here.
You take primitive operations: addition,
multiplication, sine, cosine and so on,
and we know what their derivatives are;
just look them up, or remember them,
or prove them analytically.
Then, you start composing things in
parallel and in sequence.
Then, we have all the linear functions,
which are the easy case.
That's all there is to automatic differentiation.
So, if you read the literature,
especially on what's called reverse mode,
which is what's used in machine learning,
it's extremely complex.
But it's because it has all this historical baggage.
It's because it was invented a
long time ago in the '50s or '60s,
and then reinvented by Machine Learning people,
rediscovered in terms of
something called back propagation,
but it didn't come from this kind of heritage of
crisp mathematical specifications
and rigorous development.
So, if you get rid of
all the kind of historical accidental baggage,
graphs, and variables, and layers,
and things like that, the essence
really is just what I've shown you,
and we can now put these pieces together
into automatic differentiation, and then we're done.
Well, not really, because I'm going to
show you some improvements.
So, what I want to talk about here now
is a nice way to package up this information.
So, I said we have linear functions,
and there'll be some primitives,
sine, cosine and so on, and
then sequential and parallel composition.
Now, those are the basic building blocks,
and those building blocks have
a mathematical history to them.
We can take those operations,
identity and composition;
there's an algebraic notion called a category.
I'm not expecting anybody to have a category theory background;
it's just the idea that I've got
some thing which you can think of
as a function-like thing.
It means there's a domain and co-domain.
It doesn't really have to be a function-like
thing, but you can think of it that way.
So, that's going to be this guy, so this
will be something like functions,
and it has two things, identity and composition.
The identity maps something to itself;
composition takes something from b to c
and something from a to b, and gives you an a to c. Okay.
So, this is a generalization of the idea
of the identity function and composition on functions.
There are certain laws; this is algebra,
so there are certain laws, which here are
just that composition is associative,
and the identity is the right and left
identity for composition. That's all.
So, anything that
has these kinds of operations and
satisfies the laws is a category;
that's what category means.
But now, there are some additional operations.
So, in addition to category,
there is this more specialized notion
of a Cartesian category,
and all that means is it has a notion
of products, of pairs.
There are three operations: one is extract left,
the other is extract right;
that's what I was calling first and second before.
So, given a pair, you can get the first element;
given a pair, you can get the second element;
and then there's fork, which takes two of
these funny arrows with the same domain,
and you get another one with
that domain but pairs as the result.
So, I've talked about
these operations on functions and I'm just saying,
there's actually a wider context,
not just mathematical functions or
computable functions as we use them
in computer science and software engineering.
This will be really important,
because automatic differentiation is
all about one of these things
that's not exactly a function,
it's a differentiable function:
a function that computes with derivatives as well.
Then, there'll be some other interfaces.
So, this is Haskell-looking stuff,
but I hope you get the idea.
It's talking about two different interfaces.
One specializes the other,
so it makes additional restrictions. They have laws.
The laws here say things like: if you take two of
these arrows, you combine them with fork,
and then you compose with, say,
extract left, then you get back the original one,
exactly. I think nothing here is surprising at all.
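A minimal Haskell sketch of the two interfaces, close to the paper's (I write fork as (&&&); the paper uses a triangle):

    import Prelude hiding (id, (.))

    class Category k where
      id    :: k a a
      (.)   :: k b c -> k a b -> k a c

    class Category k => Cartesian k where
      exl   :: k (a, b) a
      exr   :: k (a, b) b
      (&&&) :: k a c -> k a d -> k a (c, d)       -- fork

    -- Ordinary functions are one instance:
    instance Category (->) where
      id      = \a -> a
      g . f   = \a -> g (f a)

    instance Cartesian (->) where
      exl         = fst
      exr         = snd
      (f &&& g) a = (f a, g a)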
So, now I've shown
you all the building blocks of automatic differentiation,
and then I've shown you an interface,
which is a common interface, not
specific to automatic differentiation.
It's one that came out of algebra,
in the '50s I think, the 1950s that is.
Now, I'm just going to put all the pieces together,
and this is a Haskell program;
if you don't understand what something means,
please interrupt. All right.
So, we're going to go back to our specification
and I've tweaked it a little bit.
So, I said we're going to define this type.
It's a type of differentiable functions from a to b.
So, that's D a b, and this is Haskell.
It just says that one of these guys,
a differentiable function from a to b,
has this representation:
it's a function from a to a pair of
b and a linear map from a to b.
Exactly what I showed you before,
but now I've made this
little D thing here, which is just a wrapper.
It's just something that says
the type of this thing is
not the underlying representation;
it's a more abstract notion.
Now, here's our specification.
So, this d hat,
we're going to take a function from a to
b and give you a D from a to b.
What does it do? Just what it did before:
it takes f and its derivative,
combines them into one,
but now we wrap it up with a D. So,
there's just a little bit of
programming-language wrapping.
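In Haskell, the representation and the specification look roughly like this, a sketch again using plain functions in place of linear maps and the earlier 'der' for the mathematical derivative:

    newtype D a b = D (a -> (b, a -> b))

    dHat :: (a -> b) -> D a b
    dHat f = D (\a -> (f a, der f a))   -- specification only; 'der' is not computable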
I want to emphasize here,
the specification is not computable.
So, given an arbitrary computable function,
even if it's continuous,
even if it's differentiable in a mathematical sense,
it's not computable to construct its derivative.
So, that's not a shallow result.
So, one of the things that I really like about this,
what I'm sharing with you today,
is that we start with an extremely simple,
compelling specification.
I'm saying we just take the function and its derivative,
mathematical derivative, not computable derivative.
Simple specification, and then,
we're going to go through a process of, well,
I'll show you the outline of it,
you can read the paper for details,
of transforming the specification into an implementation.
So, we start with something that's not computable,
very simple and compelling,
and we end up with something that's efficiently computable,
and not only that, but the transformation
itself is automatable.
So, here's the specification.
So, what we're going to do is automatic differentiation.
I'm going to organize the code
as instances of these classes,
category, and Cartesian, and a few other things for sine,
and cosine, and so on.
The specification is to say that
this non-computable function D hat
preserves the category and
Cartesian structure. Here's what I mean.
It's just these five equations.
Two from the category interface
and three from the Cartesian interface.
Cartesian is just about products;
category is just about identity and composition.
So, what we're saying is that this D hat
of the identity is the identity.
D hat of g compose f is
the composition of the D hats of g and
f. The left extractor goes
to the left extractor in
the other category; that's the important thing.
On the right-hand side, we see this id function;
that id is on differentiable functions, on D.
This composition is on functions,
that one is on differentiable functions, on D, and so on.
So, the D hat of the fork
is the fork of the D hats, and so on.
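Spelled out, the five equations, written Haskell-style with the names sketched earlier; these are laws, not executable definitions, and the operations on the left are on ordinary functions while those on the right are on D:

    dHat id        == id
    dHat (g . f)   == dHat g . dHat f
    dHat exl       == exl
    dHat exr       == exr
    dHat (f &&& g) == dHat f &&& dHat g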
So, this is an extremely simple specification
of a very compelling form.
What that means is, you can really
think about differentiable functions
just in terms of the specification.
Any intuition you
form about what it means when I'm composing
these two things is captured
precisely and elegantly by these five requirements.
So, these requirements are going to be the specification,
and then the trick is to derive from them
a correct implementation that's
computable, the actual implementation.
That's the game. Yes.
>> So, [inaudible] is a functor.
>> Yes.
>> It's a functor?
>> Yes it is a Cartesian functor.
>> Okay.
>> Yes. So, more generally it's a homomorphism.
So, if we have these abstract notions,
for instance, category and Cartesian,
they have a vocabulary,
identity, compose, extract left and right, fork.
What we're saying is that D is
compositional over that structure
or distributes over that structure.
It's a homomorphism over that structure.
It's also called a functor, Cartesian functor.
>> [inaudible] is this a to a pair of a
and a linear map on a,
which is your definition of the identity on the right-hand side?
>> Yes. So, this is the regular identity function,
and that's the differentiable identity function, which
means it knows how to compute
a value and its derivative.
>> Yes.
>> Yes, and so, on all of them are like that.
This is the general game,
we'll play this game over and over,
so it's important to
pause and appreciate what's going on.
A simple specification, very regular:
a simple definition of a function,
and a specification that's always of this form,
namely that this simple, compelling definition
distributes over the structure, or preserves
the structure; it's a functor, in category language.
So, if we take these five equations and we solve them,
it's an algebra problem.
So, we solve these five equations we get this result.
Now, how do we get this? The derivation
is in a paper called
The Simple Essence of Automatic Differentiation.
It really just follows from the chain rule,
the fork rule, which I haven't shown you,
and the linearity property,
the property of linear functions
that they are their own derivatives everywhere,
and then a few special things:
we know the derivative of negation, addition and so on.
So, solve those equations and you get this form.
I'm just going to show you what this looks
like operationally.
So, for instance, this
is the one that says: if f is a linear function,
the differentiable form, the D,
is a function that takes a and gives you f of a and then
f itself. I could have written that differently as
just f fork const f, where const f is the thing that,
whatever argument you give it, gives you f back.
So, for instance, if you look at how composition works,
there's really two things in here because the meaning of
the specification D hat is we're going to have
the original function and
then simultaneously compute its derivative,
so that's exactly what we do here.
So, we compose the differentiable versions of
g and f. We apply the resulting differentiable function at a.
Well, what do we do at a?
We take f of a and we get its value b and its derivative.
Remember, each of these guys computes a function value and
its derivative, for f and for g. Then,
we take b and put it into g,
and we get c. That's going to be
the final result of the function,
but we also get a derivative there.
Here we are using composition of the functions
for the result, and then we compose
the derivatives, and this is the chain rule.
So, I just took the chain rule,
the definition, did inlining,
wove the code together.
Then fork is quite
similar, but there are no data dependencies,
so the two derivatives and
values are computed completely
independently, and then combined.
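A sketch of the solved instances in Haskell, using the D and dHat sketched above; since derivatives are represented as functions here, composing and forking them uses the function instances:

    linearD :: (a -> b) -> D a b
    linearD f = D (\a -> (f a, f))        -- a linear function is its own derivative

    instance Category D where
      id = linearD id
      D g . D f = D (\a -> let (b, f') = f a
                               (c, g') = g b
                           in  (c, g' . f'))       -- chain rule: compose derivatives

    instance Cartesian D where
      exl = linearD exl
      exr = linearD exr
      D f &&& D g = D (\a -> let (b, f') = f a
                                 (c, g') = g a
                             in  ((b, c), f' &&& g'))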
Then, for the basic operations,
negation and addition are linear,
so we can just use the general linear case,
as we did for identity and the extract left and right;
that's why those ones were simple.
Multiplication is not linear,
so there's a little more here.
For multiplication, well, we're
going to take multiplication itself as the
operation, but we also need its derivative.
So, the multiplication function at the argument a and b,
as well as multiplying a by b,
is going to give you a linear map,
and that linear map is going to
take a delta a and delta b,
and then it's going to scale b by
delta a and a by delta b,
and add them. This is what's called the Leibniz rule,
the product rule for differentiation.
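For the numeric primitives, a sketch using the linearD above; the paper packages these as class instances, here I just write standalone differentiable values:

    negateD :: Num a => D a a
    negateD = linearD negate                  -- negation is linear

    addD :: Num a => D (a, a) a
    addD = linearD (uncurry (+))              -- addition is linear

    mulD :: Num a => D (a, a) a
    mulD = D (\(a, b) -> (a * b, \(da, db) -> b * da + a * db))
      -- Leibniz rule: the derivative at (a, b) is the linear map
      -- (da, db) |-> b*da + a*db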
So, that's really, that's
it, that's automatic differentiation.
>> Where do you get the terms
da and db from in this?
>> So, think about linear maps.
There are different ways to think about linear maps,
and that's kind of the heart of the story, which we'll get to.
A simple way to think about linear maps
is that they're just functions that happen to be linear.
So, the derivative at
a point (a, b) is going to give
you a function that is linear.
That function is a function of delta a and delta b.
So, this whole thing is the linear map.
If you look, yes, it's linear in da and db.
>> Yes, I am
thinking how you would normally represent this.
>> Yes.
>> Like it would be a matrix, basically,
filling in all the possible values in there, yes.
>> Yes, so there are different ways
to represent linear maps
and we're going to talk about them.
There are some traditional ways and
some non-traditional ways that work really well.
>> So, you gave instantiations
that satisfy the equations from the previous slide?
>> So, you're asking about this?
>> So, these instances satisfy the equations?
>> Exactly. So, what
was provided here was these definitions,
these definitions organized as instances;
they satisfy the specification,
they are a solution to these equations, yes.
>> But if you have
polymorphic functions, are these solutions somehow unique?
>> Oh, you're asking whether there are
maybe some deep type-theoretic
or parametricity properties.
>> Free theorems that say, okay, there's
only one function that-
>> So, the remark is: sometimes,
if we're doing pure functional programming,
strongly typed, but more than that,
in a parametrically polymorphically typed setting,
there are certain inferences you can make that are
valid and that depend only on the type information.
Whether this is one of them, whether these could have
been automatically determined just based on the types,
I don't know. They are
not fully polymorphic, because
they require vector space structure,
and so this property, which is called parametricity,
a deep property,
is not as evidently applicable,
but that doesn't mean it's not applicable.
>> So, it doesn't apply to the lollipop?
Is that what you're saying?
>> It doesn't apply to what?
>> The lollipop.
>> Oh, yes.
The lollipop, the linear map type,
is not a fully general construction; it requires a notion of
vector spaces, or at least modules,
let's call that a generalization.
So, sometimes in the presence
of certain algebraic constraints,
there are still interesting free theorems.
I have no idea whether there are free theorems
that you know that relate to automatic differentiation.
Even to the extent that you're suggesting,
which is maybe this whole thing is a free theorem.
That would be very cool, and I really don't know.
There's something that's to me very
compelling and inevitable about this development,
and that may be a clue that
there's actually a free theorem.
Maybe it's easier, in a sense, than
I think it is; simpler.
All right, now I want to show you some examples,
and then we're going to get into
thinking about the performance characteristics
of this simple algorithm,
and how we can improve them.
So, I started out with
these three simple running examples.
This is Haskell code.
I'll talk you through it. If I
don't explain something that
you'd like explained, please ask.
So, here is a squaring function.
So, we're saying this squaring function,
square of a is a times a,
just defining this function.
Well, its type is: it goes from some type a to a.
Sorry, this is a common pun we do in
Haskell programming: we name
the variables and the types the same things.
That may be a little confusing,
and a just has to be numeric for this to make sense.
So, it could be ints and floats and doubles
and other things, complex numbers,
and maybe higher-dimensional things,
even functions and so on.
All right, but you can just think of this as being
a real number, a here.
A slightly more
sophisticated function is magnitude squared.
So, we have a pair, a cross a,
where this is pairing, and magnitude squared
is just the sum of the squares of
the components. So what do we do?
We take the pair a, b and we
square a and square b and add them.
This is a little bit less trivial example,
and here's another one, which is
the cosine and sine of a product of two numbers.
So, given x, y we're going to compute the sine and
the cosine of z, and what is z?
It's the product of x and y.
Okay? Now, why these three examples? Because they
get at scalar versus
non-scalar in different interesting ways.
This is a purely scalar function.
This one takes a non-scalar, so it has a non-scalar domain.
This one has a non-scalar domain and codomain,
both, okay? All right.
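The three examples, roughly as they appear on the slides and in the paper:

    sqr :: Num a => a -> a
    sqr a = a * a

    magSqr :: Num a => (a, a) -> a
    magSqr (a, b) = sqr a + sqr b

    cosSinProd :: Floating a => (a, a) -> (a, a)
    cosSinProd (x, y) = (cos z, sin z)  where z = x * y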
Now, we can take these expressions, these simple,
pure functional programs,
and rewrite them in a categorical vocabulary.
Okay, well, this is a big step
and it's kind of an important step.
You may or may not be familiar with
this, but there's a deep,
deep connection among the foundations of mathematics,
particularly this approach via category theory on
the one hand, and logic,
and not just one logic, but logics in general,
and, another way to think of it, computation.
Okay? At heart, computation,
logic, and the foundations of
mathematics are all the same thing.
Okay? But that connection, particularly between
computation and this foundational approach to
mathematics, category theory, is,
well, it's not so obvious in mainstream languages.
So, if you have an imperative language like
C# or Java or, I don't know,
JavaScript, or worse, one that doesn't even
have static typing, that kind of thing,
well, this connection to the essence
of mathematics, and category theory in
particular, is quite obscured.
It's there, it's just hard to see in all the noise.
But if you take a very elegant, pure, simple,
rigorous notion of programming,
namely the typed Lambda calculus,
then there's a very clear connection between the two.
Okay? So, I'm not saying that only functional programming
corresponds to logic and category theory,
I'm just saying that in that setting
it's most clearly evident, okay?
Functional programming is equally expressive
to imperative programming,
object-oriented, whatever flavor you want.
Okay? But this connection is
quite simple in this setting of
a typed lambda calculus, and this
is just a sugaring of the typed lambda calculus.
Haskell is just the typed lambda calculus, sugared.
It's a little more than that, but it's essentially that.
So, there's a result from 1980 by Joachim Lambek,
which says that if you take the typed lambda calculus,
there are different models of it,
different interpretations of it.
Okay? The one that I'm used to, because I've been
doing computer programming for a while,
is computable functions.
So, when I see this code,
lambda calculus, I think it means computable functions.
But what Lambek showed in 1980 is that
that's just one interpretation.
There are a lot of interpretations
that are consistent
with the laws of the lambda calculus,
and they're exactly the models that
have a certain mathematical property,
namely, they are the Cartesian closed categories.
Okay? So, I haven't said what closed means,
closed means you have first-class functions.
Okay? Categories means you have identity and composition,
Cartesian means you have
the first and second projections,
the left and right projections, and this fork operation.
Closed means you have a notion of first-class functions.
Anything that has those,
that satisfies those interfaces,
and I'm only showing you what two of them look like,
and satisfies the laws,
every one of those things is
a consistent model of the typed lambda calculus.
So, the way you can think of that is,
every one of those is
a valid sensible interpretation
of a purely functional language like Haskell.
Moreover, we know how to automatically convert
the Lambda calculus into this language.
So, the idea
here is that we're going to write in Haskell,
and then automatically, so not the programmer,
the compiler automatically translates it to this vocabulary.
Then, instead of interpreting
this vocabulary in the usual sense,
computable functions, I want to
interpret it in some unusual sense.
In our case, that's differentiable functions.
That's the game.
>> In the transformation
from Haskell to category form,
it looks like you've inlined
the square function in there.
Do you need to do that in
the Haskell-to-categories transformation and
break module boundaries, or?
>> No. So, the question is, well, first of all,
it's an observation that when I translated square here,
I didn't just use square,
which I could have. This is
purely just where the implementation is at this point.
It's a little awkward at this point for me
to preserve that modularity.
But there's nothing.
>> There's no deep reason.
>> There's nothing, yes.
So, at some step, this would
look like, sorry, it would be
square composed with exl,
and that would be square composed with exr.
That'll be more readable and more modular, yes.
Just not quite there yet.
All right, so for squaring, what do we do? Well,
we take the identity function and itself,
fork them together, and that gives you a function
where you give it a and it gives you a and itself.
It just duplicates, so that's a duplicating function.
Then, compose that with multiplication.
So, you take a value, duplicate it,
and then you take the product of
that pair; that's squaring.
Okay, what do we do here? Well,
we take the left extractor,
which gives us x out of x, y,
and we fork it with itself so we get x, x, right?
We multiply that, so that's x squared,
this is y squared, we do
those in parallel, and then we add the results.
You see this is composed,
so you read from right to left.
Cosine-sine-of-product is even simpler: you take the product,
and then, in parallel, the cosine and the sine.
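The translations, as a sketch using the class vocabulary from earlier; I use the paper's names mulC and addC for the uncurried categorical multiplication and addition, which on ordinary functions are just:

    mulC, addC :: Num a => (a, a) -> a
    mulC = uncurry (*)
    addC = uncurry (+)

    sqrC :: Num a => a -> a
    sqrC = mulC . (id &&& id)

    magSqrC :: Num a => (a, a) -> a
    magSqrC = addC . (mulC . (exl &&& exl) &&& mulC . (exr &&& exr))

    cosSinProdC :: Floating a => (a, a) -> (a, a)
    cosSinProdC = (cos &&& sin) . mulC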
So, what I'm saying is,
we already know how to take
derivatives if we write our programs in this vocabulary,
and that's what the previous slide shows.
So, if we take those definitions
of the operations like add,
compose, fork, and so on,
and we use those interpretations here,
we'll get versions of
these functions that compute not just the regular values,
but their derivatives as well, all right.
But this is a terrible language
for a person to have to write in.
It's a fine language for a compiler,
or for a mathematician to reason about, but it's
a terrible language for a programmer to write.
Okay, so fortunately we have
this automatic translation from
a kind of human-friendly language,
maybe that's debatable, to a mathematically
austere and easy-to-reason-about language.
So, now I want to show you some examples of
this interpretation applied
for automatic differentiation.
So, here again is the magnitude squared
function, the sum of the squares,
and here's its categorical translation;
I just mean this is exactly the same thing but rephrased,
and this step is automatic.
Now, if we interpret it,
what have we done here?
Oh yes, so this isn't differentiation,
this is just the function itself.
So, this is what the category expression looks like,
this function we can visualize in terms of a graph,
and this is what it looks like.
So, this kind of emphasizes the parallel nature
of fork here and that's this fork right there.
Okay, so what have we got?
Here's the extract left,
and it's getting fed in,
both copies, to multiply, that's this one.
Also, the right extractor gets fed
into another multiply, these two in parallel;
they take the same argument and
their results get fed into addition.
So, what do I want to say here?
Oh yes, all right, now in fact
there are three different categories going on here.
Okay, one category is the one
that we just apply unconsciously,
at least I do, which is just
to think that this means functions,
plain mathematical functions or computable functions;
that's the usual one we have in programming.
This display here, this categorical expression,
the expression in terms of this funny category language,
is another category itself.
So, I have a category that
interprets composition,
multiplication, and all the operations
in a syntactic way, and then pretty-prints the result;
okay, that's just for my convenience.
Another category builds a graph,
okay, and then I display that graph here.
Then another category is the derivative one,
the differentiable one which I'm going to show you next.
Okay, but important thing I want to say here is,
automatic differentiation is almost
always talked about in terms of graphs,
especially in Machine Learning,
especially in reverse mode.
I think that's always a mistake.
Okay, because differentiation has
nothing to do with graphs.
Differentiation, automatic or otherwise,
is about functions;
there are no graphs there.
So, think about any domain of interest.
Anytime you're doing programming, you've got, well,
some compositional thing
going on, something algebraic,
something rather expressive,
and of course we want our APIs to be expressive;
it means you use them multiple times,
you feed the output of one operation into
the input of another.
Every API you can think of as being about graphs;
arithmetic is an API, okay?
Addition, multiplication, and literal numbers
form an API that you learned as a young child,
and you can think of it as being
about graphs, but it's not.
No. Well, unless you're
really doing graph theory or something like that,
it's not about graphs,
it's about its domain,
and that domain has a vocabulary.
It hopefully has a type system that
mediates what gets composed with what.
You can think of those things as
graphs, but you shouldn't;
a compiler writer thinks about them as graphs,
A graph theorist maybe
might want to think about them as graphs,
but differentiation has nothing to do with graphs.
So, I want to make
that point because it distinguishes this work
from almost all other AD work
that I know of, especially reverse mode.
But I also want to say, you're going to see
a lot of graphs, and you might be tempted to
think I'm doing something like
these systems, TensorFlow or something,
that are all hung up on graphs. No,
these graphs are just about visualization;
they have nothing to do with automatic differentiation.
Even when I show you the result
of my differentiation as a graph,
the graph came second, not first.
Meaning it's a way for me to show you the result,
not a way to compute the derivative.
Okay, so here's an AD example,
a little simpler: this is the squaring function,
this is the expression
in this category language and this is a graph that
just shows you what
the computation looks like, all right?
Now, what I've got here is
the differentiable-function version,
so I take this expression and interpret it in
this category
of differentiable functions that I showed you.
Then I take the resulting function,
the representation inside it,
and I turn it into a graph,
I reinterpret it as a graph, and this is the result we get.
So, this is the original function, so that's a,
that's a squared, a times a,
and that's the first result.
The second result is going to be
a function that's linear,
and the way I'm showing functions
here as this little green box,
and the input is, in
this input and out is the output of the function.
So, what do we have here? This is a,
and this is the linear function that takes,
I'm going to call it da, okay, it's just a little delta.
So, we get a times da times 2, all right.
So, this is the linear map that maps da to 2ada.
You're used to thinking of the derivative as 2a,
the derivative of a squared is 2a.
It's not; it's da times 2a.
Okay, that's a linear function,
it's a function from da, that multiplies da by 2a.
When we say the derivative of a
squared is 2a, that's what we mean,
it means multiply by 2a because it's linear.
Okay, here's a more sophisticated example,
the sum of two squares.
okay this is the categorical
expression and when we interpret
these operations in
the differentiable function category, we get this result.
Again, we're going to compute the original function here,
so this takes a and b,
so that's a squared, b squared, and we're adding them.
That's the first result, and
the second one is the derivative.
So, it's going to take a da and a db,
and here we get da times a,
that's a times da, and this is, what,
b times db,
times db... that's not right.
Okay, so we get... oh yes! Of course, that's exactly right.
Never mind what I said;
this is correct, my brain was incorrect.
This is da times a,
doubled, so that's 2a da,
this is 2b db, and we're going to add them.
That's the linear map, right.
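In symbols, what the graph is showing:

    \widehat{\mathcal{D}}\ \mathit{magSqr}\ (a, b) \;=\;
      \bigl(\, a^2 + b^2,\ \ (da, db) \mapsto 2a\,da + 2b\,db \,\bigr)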
>> So, there is no real magic here,
right? You define the primitives,
like making cosine a special one you include once,
and then you just compose the-
>> Exactly.
>> The D guys, and that's it.
>> Exactly.
>> Right, yes, there's no magic here.
>> No.
>> Yes.
>> Exactly.
>> Except for the derivatives.
>> Yes.
>> That you need to feed in.
>> Yes.
>> But the rest is all composition.
>> Yes, so we have to know the derivatives of our primitives,
and then for every composing form,
of which there are two main ones,
sequential and parallel,
we know how to compose
these differentiable functions, and that's it.
That's all there is to automatic differentiation.
Okay. Yes, thank you,
I want to emphasize the simplicity of
what's going on here particularly in
contrast to the complexity of what goes on in
the literature about automatic differentiation,
particularly what's called reverse mode,
which is the mode that's needed for machine learning and
other high-dimensional optimization problems.
>> A question.
>> Yes.
>> How would you deal with sums,
and I'm thinking about like discontinuities and so on.
>> Yes, so sum types.
>> Yes.
>> Yes, so the question is, I've shown you how
we can deal with scalars,
and we can deal with products.
So, what about when we have
sum types, disjoint unions, that kind of thing?
There are a few angles on the question,
but the first and most fundamental question
is whether differentiation is even meaningful,
because once you have sum types,
you no longer can have purely continuous functions.
Well, you can, but none of them are interesting, right?
They would have to always stay in the left or
the right disjunct of the sum. All right.
So, I don't know,
I'm going to show you something that is similar
but interestingly different from sums,
but your question, yes, it's really about sums.
It's something I've thought about;
I think there's probably a really lovely answer,
and I don't know what it is.
>> I suppose in some way you could encode
sums and fit them inside your system with what it does have.
>> Yes.
>> What does it look like at the end of the day.
>> Yes, so you can Church encode it.
You can represent,
so a sum type is like a disjoint union:
it's either this or that, plus a tag,
something that tells you which it is.
So, even with something like real plus real,
you still get to know which real it is,
the left one or the right one;
that's why we call it a disjoint union.
I forgot what your remark was.
>> What do you get if you, I mean, Church encode it?
>> Oh yes, Church encoding. Yes, so
these types can be encoded in terms of functions,
actually even pairs can be encoded in terms of functions,
but actually that's not enough,
you need universals as well,
universal quantification, yes.
With universal quantification and functions,
you can get pairs and sums.
But I don't know: if you encode
even products that way, before we get to
sums, if you encode products in that way,
what happens to the notion of differentiability?
Does it just ride along in
a sensible way with the Church encoding?
That's a really interesting question.
If the answer is yes,
then it will be encouraging to try
the same thing with sums.
>> It might get to the same round of [inaudible].
>> Yes, yea, it might.
>> Yes, these are interesting questions.
>> The question is also, basically,
whether the Church encoding is an instance of continuations.
>> Yes, okay, and we'll get into that.
>> So, I'm sure we'll get to continuations.
>> Yes, we'll get to continuations. All right.
>> For Church encoding you need rank-two types-
>> Yes.
>> Which correspond to cones in category theory;
would they extend here in your theory?
>> Yes absolutely, yes.
What I love about this framework is, if you think about
automatic differentiation, and not
just that but a lot of different problems,
problems a lot of people work on, we can re-examine them
in terms of this language
of category theory.
What it leads us to is specifications that are so simple.
All right?
If I was trying to start from back propagation,
as it's described in the machine learning
literature or the AD literature,
I wouldn't have a prayer.
I don't even understand what this algorithm is,
I don't understand its meaning
or whether it's correct or not,
with all these complications.
But when we get to this really simple foundation,
then we can revisit these questions:
yes, what if you do Church encoding,
what if you do sums,
what if you aren't happy with the performance
you get out of it and you want to reason about that.
>> So, your objective is, you're given a circuit,
an arithmetical circuit, and you want to differentiate it?
>> I'm given a function, not a circuit.
>> Okay.
>> Or whatever somebody might say.
>> You don't want to use that word here.
>> Okay, yes.
>> You're given the representation
of a mathematical function.
>> Yes, you're right.
>> Your objective is to compile it,
compile the derivative and optimize it.
>> Yes.
>> That's the main goal.
>> Yes.
>> Its derivative.
>> To repeat the question: is it my goal
here to take a description of a mathematical function,
and generate code that computes its
derivative correctly and efficiently?
Is that the question, right?
>> Yes.
>> The answer is yes.
Yes that's exactly my goal.
Now, there is an important distinction:
we're given a representation of a mathematical function.
Given a mathematical function
itself, it's not computable;
we can't compute its derivative. But
given a representation, yes.
So, there are different representations you might choose.
A popular one is graphs.
What I'm saying is that's always a bad idea
compared with this much simpler setting of saying,
"Well, we're going to take the Lambda calculus."
Moreover, that's just a starting point;
what's really much more convenient
is this categorical expression,
to which we can automatically translate.
That step has nothing to do with differentiation.
>> There is also this type of symbolic differentiation.
>> Yes.
>> Is it different? I mean, I remember
the old numerical Fortran programs.
>> Yes.
>> But the high-level goal seems identical.
>> Yes, it is.
>> Yes.
>> So, if you read the AD literature,
maybe some of the machine learning AD literature,
a lot of papers start out by saying there are
three different ways to compute
derivatives on a computer.
One is what they call numeric, or approximation;
that's also called the secant
method, or the method of differences,
those are all different terms for the same thing,
and it's just: you take a couple of points,
hopefully near each other,
you draw a line,
and you say that's roughly the derivative.
Okay, well it's incorrect.
>> Isn't that what's called numerical?
>> Yes, people will call it
numerical, but that's not a good description,
because automatic differentiation is numeric too.
If you look at it one way it's numeric,
if you look at it another way it's symbolic.
So, calling just the first
one numeric is a little misleading.
>> Okay.
>> Yes, because people say the difference between AD
and symbolic differentiation is that
AD is numeric and symbolic is symbolic.
But that's a funny perspective;
I think it's a quite popular perspective,
but I don't think it's a very good one.
>> Well.
>> Well because-
>> Also, it seems it's all symbolic;
you're just working with symbols, right?
>> Yes, I think so. It's one of
these things where I would agree with you,
except that what you're saying is true but misleading.
>> Okay.
>> In the following sense,
so it may be factual but it's
misleading in the following sense:
everything that is done by a compiler is symbolic.
>> Sure.
>> Okay, so to single something
out and say it's symbolic,
that's not distinctive; if you're in a compiler, everything is symbolic.
Once you're at runtime,
you can say everything's numeric.
Well, okay, if your program is doing
symbol manipulation, like symbolic differentiation,
then, okay, it's not
a very useful way to think of it as numeric;
of course it is, because it's
using bit encodings of things,
but that's not a very helpful perspective, yes, exactly.
So, everything is symbolic when done by a compiler.
For automatic differentiation, the story is:
there are three ways to compute derivatives,
one is incorrectly by numerical approximation,
that's called the secant method or
method of differences, okay?
That's very popular, but it's
terrible performance-wise and accuracy-wise.
Another is what's called symbolic differentiation,
which is you write code
that does what you did in high school,
which is to manipulate symbols, and we'll say
that the derivative of sine u is cosine u du,
so you're writing something that is manipulating
representations of mathematical expressions, okay?
That's typically called symbolic,
and then there's automatic differentiation,
okay, which is what I'm showing you,
which uses the chain rule and so
on. But from my perspective,
I don't think that's a very helpful description.
I don't think it's true.
>> Because it's higher level, you're saying,
it's more mathematical; that's why it's not just numeric?
>> What I'm saying is that-.
>> It has algebraic rules.
>> Yes.
>> Like the rules you use when you
reorganize code in a compiler.
>> What I'm really saying is this,
automatic differentiation is symbolic differentiation
performed by a compiler.
That's my perspective on
the relationship between the two.
So you won't find that statement in the literature;
you'll find these treated as two
very different things, and people will
look at symbolic differentiation and say it
has all kinds of problems.
I think what is true is symbolic differentiation
done badly has all kinds of problems,
but that's nothing to do with symbolic differentiation,
that has to do with doing things badly,
all right? So, don't do it badly.
>> Symbol.
>> In the context of a compiler,
it's convenient to do it well.
So, I'm doing symbolic differentiation,
I'm tricking a compiler into doing
symbolic differentiation.
No big deal, because a
compiler does everything symbolically.
I want it to have good performance;
no big deal, because compilers are good
at maintaining performance, okay.
>> Can I say that one difference is that
with the usual symbolic differentiation,
it may not work, right?
You may write expressions that
cannot be handled by the code;
in your case there's a guarantee that if I
write within these limits and it type-checks, it works.
>> Yes.
>> So, now [inaudible] yes.
>> It's factual and misleading, yes.
So, if you do
things symbolically, you can run into problems;
if you do it here, you can't run into that problem.
But actually you can run into
the same problems in either
setting, which is: if you try to
express something outside of
the vocabulary for which we know how to
symbolically or automatically
differentiate, then we're going to run into problems.
So, maybe in one setting,
it's a little more common
to run past that expressiveness;
maybe, I don't really know if it is.
Okey-dokey, so here's the third example,
the sine and cosine of the product, okay?
This is the undifferentiated version,
and here's the differentiated version.
Okay, and what I want to show here is that there's a lot
of sharing of computation between the regular value,
which is sometimes called the primal, and the derivative.
Okay, well this is a small example,
so there can't be a whole lot of sharing because
there isn't a whole lot of computation going on.
But this is crucial,
and I forgot to mention this at first,
this one step early on in
the talk: when I said sequential composition,
the chain rule is not compositional,
ironically, since it's entirely about composition.
I made it compositional by
augmenting the function with its
derivative, instead of just computing the derivative.
That augmentation not only makes it compositional,
it makes it possible to compute
efficiently, and that's because
the original definition of just the derivative
uses the derivatives of functions
and the regular values of functions, and usually there's
a lot of shared computation
between a function and its derivative.
So, by putting them both into one specification,
it makes it possible to optimize and share computation.
Okay? Not only possible, it's quite easy, because I
showed you the code that shares, okay?
With what's called symbolic differentiation,
sharing is a big deal, and
the main criticism of symbolic differentiation is that,
well, you don't get sharing;
you end up doing exponentially large
work to compute, or
exponentially large work to optimize
away the redundant work,
so you usually do a huge amount of work at
compile time or at runtime.
All right, that's just
because people are used to
doing symbolic differentiation badly,
that's all; it's not
a problem with the symbolic approach, and as I said,
this is symbolic but the compiler's doing it.
I don't even have to help the compiler
be smart about sharing, because
compilers are smart about sharing;
that's part of their job.
>> That really helps, to turn the tree into a DAG.
>> Yes, the compilers do that,
when you say let. But on
the remark about wanting to turn trees into DAGs,
what I would say is, stop thinking about graphs, okay.
So, if you think about graphs,
then you're going to run into this thing like, "Oh my God,
it's easier to manipulate trees
compositionally than graphs," graphs are very awkward,
trees are super simple, it's like the simplest thing.
So, oops! Some trees are
exponentially larger than the graphs they represent.
In other words, trees don't have sharing;
you have to do duplication to represent the same thing.
So, you could think in terms of graphs and trees,
and manipulations of graphs that refactor big trees
into graphs with
more sharing, but that's a lot
of work and it's really complicated.
Just don't create the problem in the first place,
don't think about trees and graphs because
differentiation has nothing to do with trees and graphs.
Okay. So I go back to
the Haskell code that
is derived by solving these five equations,
the structure preserving nature
of the d-hat operation, okay?
So, this is the code I showed you a few slides ago,
and I want to note
that there is something that jumped out at me
when I first was working on this differentiation
in this category language.
I started noticing a pattern,
and when I start noticing patterns,
I like to write pretty code.
So, I tried to pull
those patterns out so I could
really see what was going on.
I noticed the first one that really caught my eye is
the composition of differentiable functions involves
the composition of derivatives, okay?
The identity differentiable function involves
the identity linear map.
So, the differentiable
identity function involves the identity linear map,
which is its derivative everywhere.
The composition of differentiable functions
involves the composition of
the derivatives, of linear maps.
So, not only that, but that's all that's needed.
These definitions use nothing about linear maps
other than that they have identity and composition,
and that they satisfy the laws.
So, that was interesting.
Then we get into Cartesian: oh, interesting.
The left extractor involves
the left extractor for the linear map,
the right extractor the right extractor,
and the fork involves the fork, and that's all it used.
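As a concrete reference point, here is a minimal Haskell sketch of the instances being described; the class and constructor names are my approximations, not the paper's exact code:

    import Prelude hiding (id, (.))
    import Control.Category (Category (..))

    -- Differentiable functions: the original function paired with its
    -- derivative at each point, the derivative here just a linear function.
    newtype D a b = D (a -> (b, a -> b))

    -- A small Cartesian vocabulary (illustrative names).
    class Category k => Cartesian k where
      exl  :: k (a, b) a
      exr  :: k (a, b) b
      fork :: k a c -> k a d -> k a (c, d)

    instance Category D where
      id = D (\a -> (a, id))
      D g . D f = D (\a -> let (b, f') = f a
                               (c, g') = g b
                           in (c, g' . f'))   -- chain rule: compose the derivatives

    instance Cartesian D where
      exl = D (\(a, _) -> (a, fst))
      exr = D (\(_, b) -> (b, snd))
      fork (D f) (D g) =
        D (\a -> let (b, f') = f a
                     (c, g') = g a
                 in ((b, c), \da -> (f' da, g' da)))

Nothing on the right-hand sides uses anything about the derivatives beyond identity, composition, projections, and fork.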
So, this code is not only pretty and poetic,
which is really how I got to it,
but it says something very powerful,
which is that linear maps,
there's almost nothing you need to know about them.
You need to know that they form
a Cartesian category also.
That means that every function-like thing
that forms a Cartesian category,
and there are a lot of them,
can play the role of linear maps.
So, this observation, and writing
the code with purely aesthetic motivations,
led to what I think is
a deep and very powerful insight, which is that
automatic differentiation can be expressed
not only elegantly,
but much, much more
generally than the original specification.
So, that's the point here.
Every one of these operations on
differentiable functions just uses
the corresponding operation on linear maps,
and therefore we can generalize:
we replace linear maps by any
other thing that has identity, composition,
projections and fork, and satisfies the laws.
So, all I've done from this slide to this slide,
is I've parameterized this notion.
So, D here, it had linear maps built in.
This D, the subscript says,
it's going to take a parameter,
it's going to take any old Cartesian category,
for instance linear maps, and use them here.
The definitions are unchanged.
Well, I made one little change.
I'm making a clear distinction: linearD takes
a function and its derivative,
and they're always going to be the same,
like id and id, or exl and exl.
But one will be on functions and the other will
be on the linear-map-like thing.
So, the code really is unchanged,
except for this duplication,
just to make that distinction clear.
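A sketch of that parameterization, with the category made into an explicit parameter (again, names are approximate rather than the paper's):

    import Prelude hiding (id, (.))
    import Control.Category (Category (..))

    class Category k => Cartesian k where
      exl  :: k (a, b) a
      exr  :: k (a, b) b
      fork :: k a c -> k a d -> k a (c, d)

    instance Cartesian (->) where
      exl = fst
      exr = snd
      fork f g = \a -> (f a, g a)

    -- Generalized AD: the derivative component lives in an arbitrary
    -- Cartesian category k (for instance, linear maps).
    newtype GD k a b = GD (a -> (b, k a b))

    instance Category k => Category (GD k) where
      id = GD (\a -> (a, id))
      GD g . GD f = GD (\a -> let (b, f') = f a
                                  (c, g') = g b
                              in (c, g' . f'))

    instance Cartesian k => Cartesian (GD k) where
      exl = GD (\(a, _) -> (a, exl))
      exr = GD (\(_, b) -> (b, exr))
      fork (GD f) (GD g) =
        GD (\a -> let (b, f') = f a
                      (c, g') = g a
                  in ((b, c), fork f' g'))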
Well, there's one awkward thing here:
negate and add are linear,
so we're going to take just the linear versions
of them. So here we are.
See, this is the left extraction on functions, and this is on,
I want to say, linear maps,
but it's not just linear maps;
it's arbitrary, this notion here.
But multiplication, that's awkward.
I don't really have a way to talk about it,
because originally I wrote it this way,
but this is functions.
This linear map here is literally a function,
so it's not an arbitrary Cartesian category.
Well, so, let's write this in another way.
I'm going to suggest a couple of pieces of vocabulary.
I'll tell you what they mean on functions,
and then we can generalize them also.
Then we get another expression
of a derivative of multiplication.
So, there are two things going on.
One is scaling and
the other is where we take a pair and we break it up.
We do some computations and we add the results.
Those are the two patterns that I want.
So, one of them is called scale.
So, scale of u,
is the linear function that multiplies its argument by u.
So, multiplication isn't linear.
That's what's called bi-linear.
But when you provide one argument to multiplication,
the result is linear.
Then there's this operation, I'm calling it join.
So, remember fork: it took
two functions, or two arrows, with the same domain,
took an argument, gave it to both,
and then paired up the results.
Join is the dual of that.
Sorry. Yes, that's right.
So, with Join, we have two of
these maps with the same co-domain:
the same result type, and possibly different domains.
So, we're going to take a pair
rather than generating a pair.
So, it's just a dual,
a reverse of a fork. What do we do?
Well, when you do join f and g, you take a pair.
You parcel out the arguments,
give one to f and one to g, and add the results.
So, that I'm calling join.
With this vocabulary, we can just
refactor this expression and now we get this form.
lambda ab, we're going to join
scale by b, with scale by a.
So, why is b first? Because it's b times da, a times db.
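As a small sketch of that rule, with plain functions standing in for the linear maps (illustrative names, not the paper's code):

    scale :: Num s => s -> (s -> s)
    scale u = (u *)                       -- the linear map "multiply by u"

    joinF :: Num c => (a -> c) -> (b -> c) -> ((a, b) -> c)
    joinF f g (a, b) = f a + g b          -- parcel out the pair, then add

    -- Multiplication with its derivative at (a, b): (da, db) goes to b*da + a*db.
    mulD :: Num s => (s, s) -> (s, (s, s) -> s)
    mulD (a, b) = (a * b, joinF (scale b) (scale a))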
So, now with this vocabulary,
I haven't really made this anymore general.
I've just described it differently but
I've introduced new vocabulary that does generalize.
So, here's a new vocabulary.
We have the category and Cartesian forms,
and now we have this co-Cartesian one.
It gives three operations that are the duals,
the reversals, of the Cartesian operations.
So, instead of having two projections or extractors,
we have two injections.
Left and right. Instead of
taking two arrows with the same domain,
we take two with same co-domain.
So, this is the new vocabulary.
Then one more thing: a class for things
that know how to do scaling. All right.
So now, given this vocabulary,
now we can write linear transformations as functions.
So, all I'm doing is repackaging what I said before.
So, this is a little newtype.
This is just saying we're going to represent
these additive functions
as just regular old functions,
but in a context of additivity:
all the types involved are additive.
Now, where that additivity comes in,
I'm omitting from this slide, but it's in the paper.
The interesting new thing here
is that we have these co-Cartesian operations.
To inject from the left,
we pad with a zero on the right;
to inject from the right, we pad with a zero on
the left; and the join operation does the addition,
parceling out and then adding,
with scaling just as multiplication.
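A minimal sketch of those definitions for ordinary functions on additive types; the class and operator names are mine, and the paper wraps this in a newtype that I omit here:

    class Additive a where
      zero  :: a
      (^+^) :: a -> a -> a

    instance Additive Double where
      zero  = 0
      (^+^) = (+)

    instance (Additive a, Additive b) => Additive (a, b) where
      zero = (zero, zero)
      (a, b) ^+^ (c, d) = (a ^+^ c, b ^+^ d)

    inlF :: Additive b => a -> (a, b)
    inlF a = (a, zero)                    -- inject left: pad with a zero on the right

    inrF :: Additive a => b -> (a, b)
    inrF b = (zero, b)                    -- inject right: pad with a zero on the left

    joinF :: Additive c => (a -> c) -> (b -> c) -> ((a, b) -> c)
    joinF f g (a, b) = f a ^+^ g b        -- the join does the addition

    scaleF :: Double -> (Double -> Double)
    scaleF s = (s *)                      -- scaling is just multiplication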
So, with these definitions,
we get back to where we were before:
we're essentially using
functions to represent linear maps,
but now we have this more elegant vocabulary.
Now we can do something more interesting:
we can change our representation.
Instead of using functions to represent
linear maps, we can use other representations.
Why would we want to?
What's wrong with using these functions?
It's the simplest way to do it, but it has
a problem, and here's the main one.
It's that sometimes we need to
extract a data kind
of representation from the linear map.
So, if we get a linear map,
it's represented by a function.
I might need to do something like, well,
a couple of examples are,
one is, if you're doing computer graphics,
and you're doing a lighting and shading,
so you got some 3D model and
the light source, surface properties,
and so on, to light a surface at a point,
we need to know the normal vector.
So you got the surface,
and you can think of the tangent plane or normal vector,
and we're going to take a dot product between
the normal vector and the vector from the eye to
that point, or from the light to the point, I guess.
So, how do we get the normal vector?
Well, we take the cross product of
the two partial derivatives.
So for that, we need a data
representation, we need vectors.
How do we get from a function that we know is linear,
to a compact data representation?
A vector for it?
Well, there's a simple way to do it,
which is: you take the function,
and as long as it's linear, you can sample it on a basis.
The results you get are the vectors.
So, you take the function
that you know is linear and
sample it on a basis.
What's a basis? For instance,
take the identity matrix, all right?
The rows or columns, however you think about it,
of the identity matrix form a basis.
So, you apply the function to each of those vectors,
then you accumulate the results,
and that is the Jacobian matrix,
the data representation.
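Here is a small illustration of that basis-sampling idea with naive list types (my own sketch, not code from the talk):

    type Vec = [Double]

    -- The standard basis of R^n: the columns of the n-by-n identity matrix.
    basis :: Int -> [Vec]
    basis n = [ [ if i == j then 1 else 0 | i <- [1 .. n] ] | j <- [1 .. n] ]

    -- Columns of the matrix of a function f : R^n -> R^m that we trust to be
    -- linear, obtained by applying f to each basis vector.
    materialize :: Int -> (Vec -> Vec) -> [Vec]
    materialize n f = map f (basis n)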
But it's terribly expensive in some settings.
If the domain of that function,
which is what we need a basis of, is high-dimensional,
say a million, and that's not atypical,
that's quite typical for machine learning,
so it's in the millions,
then I'm going to make a million-by-a-million
identity matrix.
I've got a million of what are called one-hot vectors,
which are zero everywhere with a one in a single position,
and I pump them all through this function.
It's extremely expensive; it's quadratic work.
So, this is not a viable approach.
This is why we wouldn't want that representation.
If we're doing optimization problems,
particularly if we have high-dimensional domains,
we would not want to use functions as our
representation, because it's just
too expensive to extract a data representation.
Another motivation besides computer graphics
is just search:
machine learning, deep learning in particular.
We want to do gradient descent; what do we need?
We need a gradient vector,
because that's what tells us in what direction to
adjust the input while searching for
a local minimum or maximum.
So, for an n-dimensional domain we could make n passes,
each on a one-hot vector,
which is very expensive, and particularly
expensive for gradient-based optimization,
which is like the most
valuable application of these ideas.
So, ironically, we have a very low-dimensional co-domain;
the result is one-dimensional
if we're doing optimization.
We have a function that takes
some high-dimensional input and gives
you one objective-function value.
Unfortunately, the cost of
converting the function representation depends on
the dimension of the domain;
it relies on the domain being small.
But the domain is large and the co-domain is small.
So, this is a serious problem.
So, there's a solution which is
use matrices in the first place.
Don't use functions to represent
linear maps. Just use matrices.
That's what everybody really does. All right?
Well, in a nicely typed setting,
there's a lovely and general way to think about that,
which is that a matrix,
with its rows and columns, is a two-dimensional rectangular thing.
Think of it as a structure of structures:
it's got a row structure and a column structure,
a structure of structures, and those structures
depend on the domain type and the co-domain type.
Okay, so that's what's going on here.
I'm not going to go into this in detail.
But if you define a notion of a matrix-like thing
in a more general setting,
and you define what it means to apply one of
these matrices to a vector,
and also to multiply one matrix by another,
that gives us an algebra problem
of the same form, these five equations:
say that this apply operation preserves structure.
We can solve those equations,
and what comes out of them
is matrix multiplication and so on.
Matrix multiplication is
a solution to an algebra problem.
I think I will skip over this part.
This is just about these three pieces of vocabulary:
scale, fork, and join,
that is the vocabulary of matrices.
If you think about matrices,
every matrix is either a one-by-one,
which means it's a scaling,
or it's a horizontal
or vertical juxtaposition of matrices.
That's not how I was taught to think
about matrices, but I think it is a very nice way.
If you have those three building blocks, one-by-ones,
vertical juxtaposition of matrices with the same width,
and horizontal juxtaposition of matrices with the same height,
you get all matrices, and those operations are
exactly scale, fork, and join.
The types guarantee that when you stack matrices,
they have the same width,
and if you juxtapose them
horizontally, they have the same height;
that's in the types. Where does that come in?
Join requires that the co-domains are the same;
fork requires that the domains are the same.
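A rough sketch of those building blocks, using an untyped list-of-rows representation (the talk's version uses the types to enforce the width and height conditions; these names are mine):

    type Matrix = [[Double]]            -- list of rows

    scaleM :: Double -> Matrix
    scaleM s = [[s]]                    -- every one-by-one matrix is a scaling

    -- "fork": stack two matrices with the same width vertically.
    forkM :: Matrix -> Matrix -> Matrix
    forkM f g = f ++ g

    -- "join": put two matrices with the same height side by side.
    joinM :: Matrix -> Matrix -> Matrix
    joinM = zipWith (++)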
Now.
So, at this point we're talking about matrices.
I would say, we tried functions,
it works out very nicely.
But, for high-dimensional domains,
it's too expensive to extract the data representations.
So, why don't we use matrices in the first place?
We can do that, but now
an interesting question arises,
which is that a lot of matrix multiplication happens:
everywhere we said composition,
that gets implemented by matrix multiplication.
Matrix multiplication is associative.
Why is matrix multiplication associative?
It's maybe a funny question,
why is it associative?
You might say, "Well,
here's a proof that it's associative."
But, I want to say,
"No, that's the hard way."
Matrix multiplication is associative
because function composition is
associative and matrix multiplication
correctly implements function composition,
when the functions are linear.
That's a much more direct way to think
about compositionality and about matrix multiplication.
Matrix multiplication is solution of a problem,
which is how do you implement correctly,
composition of functions that are
represented by matrices, and then,
answer to that or the problem is matrix multiplication,
the associativity follows from
the original motivation for making
it the first place, not after the fact.
So, matrix multiplication is associative,
but different ways of associating give you
different performance, very different performance.
So, that means if we're going to use matrices,
we want to be clever about how to associate.
The definitions that I showed you are not
at all biased as to how
to associate the compositions of linear maps.
Whatever composition structure you give originally,
it's going to maintain that structure.
Some of those compositions will be efficient,
and other ways you might have composed it,
but didn't, would be more or less efficient.
So, what we'd like to do, though, is not rely on
the programmer to write the compositions
in a way that's efficient for
automatic differentiation, because the programmer
has a different thing he's trying to achieve,
which is clarity and modularity.
The compiler should then
figure out how to make it efficient,
not the programmer, because those two are at odds with
each other: efficiency and
modularity are at odds with each other.
So, fortunately there's a simple way to,
well, I don't think it's simple actually,
there's a way to take any composition
of matrices and re-associate it
into something that's computationally optimal.
While it's not simple, there are a couple of algorithms.
One is the n-cubed algorithm that
does dynamic programming, and then
there are these much cleverer,
subtler algorithms that do it in n log n time.
Where is this smarts going to go?
Here's the cool thing: the cost of
matrix multiplication depends only on
the dimensions of the matrices, not the values.
Unless you want to be really clever and talk about
special matrices where, in a way,
you can predict where the zeros
are going to be;
let's say we're not talking about special matrices.
So, the optimal way to associate
the multiplication of matrices
depends only on the dimensions.
The dimensions depend only on
the types in this setting.
So, the dimensions are part of
the types, and that's compile-time information,
because it's a statically typed language.
So, we can do the optimal
re-association at compile time,
not at run time; that's a huge win.
There are two common associations that are made.
If you associate all the compositions to the right,
you get what's called forward-mode AD;
if you associate to the left, you get what's called reverse-mode AD.
For any kind of
gradient-based optimization,
reverse-mode AD is significantly better.
How much better? It depends on the dimension of the domain.
For machine learning these days it's huge,
it's several millions.
So, how can we do this optimal re-association?
Well, as I said, there are these algorithms,
but there's another way, which is:
if we just assume that the final codomain,
the final result type, is low-dimensional,
particularly if it's one-dimensional,
then I think it's the case,
and I don't know this for a fact,
I think it's the case that full left association,
in other words reverse mode, is always the best strategy.
I'm not sure that's true;
I am sure that it corresponds
exactly to what's done in machine learning,
which is what backpropagation does.
So, how can we think about getting
this left association automatically?
There's this nice technique which
is related to something
called continuation-passing style.
So, the idea is the following:
If I have a function going from a to b,
and these can be linear maps or regular functions and
so on, I can represent it by something that says,
"If I know how to consume b,
I can give you a way to consume a."
So, these two are equivalent representations,
you can transform either one to
the other without loss of information.
So, if I have a direct way of going from a to b,
there is an indirect representation that says,
"If I know what I wanted to do with b,
then I would know what I wanted to do with a,
which is apply this thing and then consume the b."
So, given an a, we're going to compute
the b and then consume it.
So, that's the meaning: given a function f,
we interpret f as something
that composes with f on
the right and with some consumer
of the resulting b value on the left.
That's where continuations come in.
If you make this transformation
and package it up as a category,
then no matter what composition style
you used in the original expression,
it's going to turn into fully
left-associated compositions in the representation.
You need a way to kick it off, and that's just
by supplying the identity continuation.
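A minimal sketch of that representation with plain functions (illustrative names; r is the final result type):

    newtype Cont r a b = Cont ((b -> r) -> (a -> r))

    -- The meaning: interpret f as "compose the consumer after f".
    cont :: (a -> b) -> Cont r a b
    cont f = Cont (\k -> k . f)

    -- Kick it off by supplying the identity continuation.
    runCont :: Cont r a r -> (a -> r)
    runCont (Cont t) = t id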
Then, one point I want to make is that I think there's a very,
very big performance win
here from this perspective, which is:
remember, we're going to be computing derivatives
of a function at a point, its input,
and when f is higher-dimensional,
say in its domain and co-domain,
its input and output, then that derivative
is going to be something large;
represented as a matrix it's going to be
two-dimensional, quadratically large. A lot of times, these
internal matrices are quite sparse; it could be
the zero matrix, which actually happens fairly frequently.
So, very sparse; the density is how many elements are non-zero.
The zero matrix is the sparsest you can get.
Also quite common is the identity matrix;
that's also extremely sparse, since its number of
non-zero elements is only the square root of
the total number of elements.
So, these matrices can be very
sparse, and that means
you're wasting a lot of computation.
If instead of just computing them,
we compute them composed
with the continuation that we're given,
then we can bypass
the construction of the matrices entirely,
and we get very small computations.
I'm not sure, but I think this is probably a very large
win performance-wise, and it's also very easy to implement.
>> So, Conal, there are only 15 minutes
left. Make room for questions.
>> Yes, absolutely, yes, thank you.
So, this slide and then a couple of
more is the punchline.
>> Yes.
So, this is a continuation category.
As I say, we've got
some underlying category, denoted by this arrow,
and we've got some ultimate result type r. So,
this r is going to be the ultimate codomain.
We're just going to package up
this notion of continuation,
and now we set up another algebra problem,
which is: how do we interpret,
say, a function here, or a linear map,
in this category? Well, we just do what I said before:
we're going to compose with
the function on the right,
and the continuation is on the left.
Now, we say cont has to preserve structure,
meaning those same five equations,
always the same five equations:
it maps compositions to compositions and so on.
When we solve this algebra problem
with the help of some isomorphisms,
we get this definition.
So, this definition is
reverse-mode automatic differentiation;
this code is reverse mode.
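For the Category part of that definition, here is a sketch of what solving the "cont preserves structure" equations gives; the Cartesian and co-Cartesian cases work out analogously, and the names are approximate:

    import Prelude hiding (id, (.))
    import Control.Category (Category (..))

    newtype Cont k r a b = Cont (k b r -> k a r)

    -- The specification: cont f means "compose with f on the right".
    cont :: Category k => k a b -> Cont k r a b
    cont f = Cont (. f)

    instance Category k => Category (Cont k r) where
      id = Cont id
      Cont g . Cont f = Cont (f . g)   -- composition flips, so every chain ends up
                                       -- fully left-associated: reverse mode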
It's much, much simpler
than anything I've seen in the literature.
The literature talks about graphs and talks about mutation,
partial derivatives; none of that is necessary.
Okay. All that's necessary is
this little continuation trick,
and that trick is not tied to differentiation.
It's just something that would be in a library.
You're programming with this vocabulary;
we're going to have a library.
It's going to have continuations, and they become
useful for all kinds of things.
So, this is reverse-mode AD.
It's the general AD category parameterized by
a continuation-transformed version of matrices.
>> Triple M subscripts.
>> Yes. So, matrices are indexed by the scalar.
The continuation is indexed by, well,
in this case, matrices and the final result type,
and you plug that into the general AD thing,
which is no more complicated than the specialized AD.
So, this is reverse-mode AD without tears.
Every time I read a paper
on reverse-mode AD, I want to cry.
So I think that's how I came up with this title.
Okay. All right.
Now, this is the final essential trick.
This is what gets us backprop.
It's a more specialized version
of reverse-mode AD that's a little more efficient.
Here's the key idea.
Look at linear transformations
from a vector space to its scalar field.
So here, u might be R to the million and s will be R,
the underlying scalar field.
In general, you can
have linear transformations between
any two vector spaces over the same scalar field,
but if the codomain is the scalar field itself,
then a linear map from u to the scalar field
is called a dual vector,
and there's this lovely duality
property in linear algebra.
You will see it in terms of matrix transposition.
Matrix transposition is all about
duality. Here's the cool thing.
If you're interested in
linear maps that result in the scalar field,
they're isomorphic to u itself.
If u is finite-dimensional.
Every one of these linear maps
has a particular form which is
the dot product with a vector,
and then it's waiting for its second argument.
Now, here's the cool thing.
We use that continuation representation.
So, this is the punch line.
We have linear maps from a to b.
We use the continuation transformation
and we get b-consumers
to a-consumers, linear consumers.
But these guys are isomorphic to b and a themselves.
So, if we represent linear maps from a to b in
this continuation form with the scalar field s
as the result type, then we can
represent that continuation form
by just linear maps from b to a.
Again, there's a very simple specification
that relates the continuation to the dual.
Again, we set up the algebra problem
with five equations and we solve it.
Now, we get this incredibly lovely form.
This gives you the implementation of a dual category.
Now, we can see, wow,
the identity is the identity,
composition is the flip composition,
the extractors are the injectors,
the injectors are the extractors,
fork is join and join is fork,
and scale is itself.
If you think of these operations,
fork, join, and scale, as matrix-building operations,
then what we've done is we've derived
the transposition of matrices.
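A sketch of that dual construction (the constructor name is mine; only the Category part is spelled out, and the rest behaves as just described):

    import Prelude hiding (id, (.))
    import Control.Category (Category (..))

    -- Represent a linear-map-like arrow from a to b by one going from b to a.
    newtype Dual k a b = Dual (k b a)

    instance Category k => Category (Dual k) where
      id = Dual id
      Dual g . Dual f = Dual (f . g)   -- composition flips
    -- With the Cartesian and co-Cartesian classes in scope, the remaining cases
    -- swap exl/exr with inl/inr and fork with join, while scale maps to itself;
    -- read on matrices, that is exactly transposition.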
So, here's backpropagation.
So, no mutations,
no variables, no graphs, no partial derivatives.
It's just the general AD
where we use the duals of additive functions,
and there are no matrix compositions here.
It's extremely efficient.
Then, I just have a bunch of examples,
but I think I'll pause and take questions.
If there are questions about examples,
I'll show them, and otherwise not.
>> Why is the dual of scale just scale?
>> Yes. That's funny.
So, the dual is not the inverse.
For the inverse, you'd want
to scale by the reciprocal.
So, it's really-
>> I think of it as the transpose.
>> I can give you a variety of answers.
If you think about matrices and you take
the transpose of a one-by-one, you get the thing itself.
But I'm not satisfied with that answer, because it
leaves the question of why you use
transposition in the first place. What does it mean?
What does transposition have to do with linear maps?
The duality and
the continuation constructions that I showed
you are exactly what motivates transposition.
So, I can say it's because
transposition preserves one-by-one matrices,
but that's a backwards answer from my perspective.
Transposition is about duality,
not duality about transposition.
>> So, I learned about automatic differentiation through
the framework of the commutative ring
of dual numbers over R.
I was wondering if you could relate the ideas of
your work to the language of dual numbers.
It seems to me that your idea of combining
a function with its derivative is exactly
appending a dual component to a function.
>> Yes. So, as I said there,
commonly AD is done in
two flavors, forward and reverse-mode.
Forward-mode is fairly easy and
convenient to express in a lot of
programming languages especially if
you have operator overloading.
A way to do that is: everywhere you have numbers,
instead use something that's
sometimes called dual numbers,
which is a pair of a number and a second number,
where the second number represents the derivative.
So, it's very much like what you see in my original type,
the function that gives you a pair
of a number and a linear map.
>> Exactly, yes.
>> Okay. So, yes,
you can explain the dual number idea
in terms of this categorical form.
But of course, if it's a pair of numbers,
then you can only talk about
differentiation of scalar functions.
All right. So, it's much, much less general.
It's also wrong the way that forward-mode AD is done.
Well, it's correct only because people are careful.
If you look at forward-mode,
usually you end
up taking functions from dual numbers to dual numbers:
a value and derivative to a new value and derivative.
But of those transformations from
a number and derivative to a number and derivative,
only some are sensible,
because the value out cannot depend on the derivative in,
and yet the type says it can.
So that to me is a fundamental flaw in this whole idea
of dual numbers and the whole way that forward-mode AD is usually done.
Meaning, you have to be careful:
of these functions from dual numbers to dual numbers,
some of them represent sensible differentiation,
and some of them don't.
If you refactor it and say, "No,
the value out cannot depend on the delta in,"
you get my representation,
in which you take a single number in,
and you get out a single number as
the value, plus another function from a delta to a delta.
So, in that representation,
the value out cannot
depend on the derivative in. Yes.
>> Makes sense.
>> So, that's something I didn't appreciate until I asked
myself exactly the same question because I did
this forward mode as well,
and I wondered, "Do these two relate to each other?"
That's when I realized, "Oh, this is
a really unfortunate aspect of the forward mode."
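For reference, here is the usual scalar dual-number trick sketched in Haskell (a standard construction, not code from the talk), including the caveat just mentioned: nothing in the type stops the value out from depending on the derivative in.

    data Dual = Dual Double Double          -- value and derivative

    instance Num Dual where
      Dual a a' + Dual b b' = Dual (a + b) (a' + b')
      Dual a a' * Dual b b' = Dual (a * b) (b' * a + a' * b)   -- product rule
      negate (Dual a a')    = Dual (negate a) (negate a')
      fromInteger n         = Dual (fromInteger n) 0
      abs    = error "abs: not handled in this sketch"
      signum = error "signum: not handled in this sketch"

    -- Differentiate a function written against Num by seeding the derivative 1.
    deriv :: (Dual -> Dual) -> Double -> Double
    deriv f x = let Dual _ d = f (Dual x 1) in d
    -- e.g. deriv (\x -> x * x) 3 == 6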
>> Let's see.
If there are no other questions, I'd like one more.
Could you go back to your slide
with the axioms for a category
and a Cartesian category.
Immediately after, you say that your implementation of
your D-hat functor satisfies these laws.
>> Yes, it does maybe.
>> Yes, okay. So, on the next slide, right?
You have your implementation of D-hat, right?
>> Yes.
>> I was wondering,
which axiom on the previous slide
does the Leibniz rule come from?
>> Oh, that's a different one.
>> It doesn't, then. Well,
that's the numeric category.
>> Yes, exactly. Exactly. So, it's
an unfortunate thing about
the way differentiation is taught in calculus classes,
the way that symbolic differentiation is done,
and now the way automatic differentiation is done.
There's a really unfortunate thing, which is:
you look at the rules you learned,
the rules I learned anyway, in differential calculus,
and you'd think it said the derivative of sine is cosine, but it didn't say that.
It said that the derivative of sine u is cosine u du,
and the derivative of cosine u is minus sine u du;
the du means times the derivative of u.
And then multiplication:
the derivative of a product uv is the derivative of
u times v, plus the derivative of v times u. All right.
Every one of those rules
has the Chain rule embedded in it.
Every one of those rules is
the Chain rule plus another rule.
So, by using this categorical approach we say: no,
it's not that the derivative of sine u is cosine u du;
the derivative of sine is cosine,
and separately we have the chain rule.
So, we don't need to have
these kinds of complicated rules,
every one of which has the chain rule embedded in it.
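A tiny sketch of what that factoring looks like: each primitive carries only its own derivative, and the chain rule appears exactly once, in composition (names are illustrative):

    newtype D a b = D (a -> (b, a -> b))

    sinD, cosD :: D Double Double
    sinD = D (\x -> (sin x, \dx -> cos x * dx))            -- derivative of sin is cos
    cosD = D (\x -> (cos x, \dx -> negate (sin x) * dx))   -- derivative of cos is -sin

    -- The chain rule, stated once and for all in composition.
    compD :: D b c -> D a b -> D a c
    compD (D g) (D f) = D (\a -> let (b, f') = f a
                                     (c, g') = g b
                                 in (c, g' . f'))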
>> Can you say anything about whether you exploited
these insights for, say,
better circuits, or was there [inaudible]?
>> Yes, yes.
So, well, I'll tell you what I do, and you can tell
me whether I'm exploiting them, if you don't mind.
So, there's this plugin I have;
it's a plugin for the Haskell compiler, GHC.
>> All right.
>> So there's GHC, it's a Haskell compiler.
I wrote a plugin for it that transforms
the internal representation, which is
a typed lambda calculus called Core.
It transforms the regular Haskell code into
categorical form and then
instantiates it with respect to
whatever category you want,
for instance, differentiable functions.
There are rewrite rules,
most of which are just specified in
Haskell code, in the rewrite-rule sub-language.
Those rewrite rules are true of all categories
if they only involve the category vocabulary,
of all Cartesian categories if they involve the Cartesian vocabulary, and so on.
So, there are a lot of optimizations in there that work for
all interpretations, including differentiable functions.
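As a hand-written illustration of the kind of translation involved (not actual plugin output), here is a small lambda term re-expressed in the categorical vocabulary and read back as a function:

    fork :: (a -> c) -> (a -> d) -> (a -> (c, d))
    fork f g a = (f a, g a)

    mulC :: Num s => (s, s) -> s
    mulC = uncurry (*)

    -- \x -> x * x, written point-free in the categorical vocabulary:
    square :: Num s => s -> s
    square = mulC . fork id id
    -- Reinterpreting the same combinators in another category (differentiable
    -- functions, circuits, ...) is what gives the other "compilations".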
>> Do you have any examples where you've tried it?
Say, taken circuits and compiled them from Haskell.
>> Yes. So, what I actually did-
>> I mean actually tried it, keeping it as a circuit.
>> Yes, well, I don't mind saying circuits.
This whole work was motivated
by compiling Haskell to hardware.
That was my job a few years ago.
I worked at a hardware company that had
this really cool dynamically reconfigurable hardware.
So, it's like an FPGA, but it reconfigured itself
two billion times a second rather than, like, once a year.
So, it reconfigured as it ran.
It ran at two billion times a second,
two gigahertz, and it
reconfigured itself at two gigahertz.
There's a huge win from doing
this kind of runtime optimization.
So, my job was-
>> Where do we see all this coming in?
>> Originally for things like FPGAs it's useful,
like running in a router,
checking your packets for a Russian virus or whatever.
>> This would be the reconfiguration.
What would require the reconfiguration
in circuits?
>> Reconfiguration in this setting:
three-dimensional circuits are much
more efficient than two-dimensional circuits;
can be, I mean.
A three-dimensional circuit can be much
more efficient than a two-dimensional circuit.
Why? Because most of the power and time
taken up during computation is moving electrons.
They have to move from one place to another.
Paths in three dimensions are much,
much shorter than in two dimensions.
So, if you can make a three-dimensional FPGA,
you can get much better performance.
But it's really hard to make a three-dimensional FPGA.
So, this fellow, Steve Teig,
had a wonderful idea, this insight;
this man is brilliant, and I knew within like
30 seconds that I wanted to work with him.
He realized that you can implement
a three-dimensional FPGA by
implementing one dimension in time instead of space.
That's the point of dynamic reconfigurability.
So, you get much, much shorter pathways in space-time,
and the architecture was called
Spacetime, and it really,
honest to God, is based on
Minkowski's mathematics of relativity,
the math that Minkowski used to clean up what Einstein did.
So, my job was: initially
it would be FPGA applications,
but eventually it would be all computation.
How are we going to program this thing?
It's not going to be in any sequential language;
every sequential language is going to be deeply
inadequate, because this is massively parallel,
and we want to be doing
millions of operations simultaneously.
Meaning, two billion times millions of operations per second.
So, it's not going to be any sequential language.
But a purely functional language
doesn't have this sequential bias.
So, how do we compile the purely functional language into
a three-dimensional FPGA? That was my job.
I hit on a solution:
I realized one day that you
could describe circuits using
this language that I've shown you.
This language is a language
that can be used to describe circuits.
Hardware engineers don't do that but you could.
Then I remembered, wow, this 1980 result that says
the typed lambda calculus is
equivalent to this vocabulary,
and Haskell is equivalent to the typed lambda calculus.
So, I go from Haskell, put it through GHC,
and get to its internal language,
which is a very small lambda calculus;
I take that internal language and
transform it into the language of categories,
and then, instead of instantiating it with
the usual category of computable functions,
I use the category of circuits,
massively parallel circuits.
Then I render that representation
in terms of graphs,
and then I generate Verilog code.
Eventually we were going to do things that are
better than Verilog, but that's where I started.
That really was my original motivation.
But then one day, because I'd
worked on AD a while ago,
I realized that the categorical approach is a much,
much better way to think
about automatic differentiation.
All right. I think that's a good stopping place.
>> Yes, thank you.
>> I'll be around.